9 research outputs found

    Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization

    In this work, we decouple the iterative bi-level offline RL (value estimation and policy extraction) from the offline training phase, forming a non-iterative bi-level paradigm and avoiding the iterative error propagation over two levels. Specifically, this non-iterative paradigm allows us to conduct inner-level optimization (value estimation) in training, while performing outer-level optimization (policy extraction) in testing. Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts such as reward-conditioned policies: (q1) What information should we transfer from the inner level to the outer level? (q2) What should we pay attention to when exploiting the transferred information for safe/confident outer-level optimization? (q3) What are the benefits of concurrently conducting outer-level optimization during testing? Motivated by model-based optimization (MBO), we propose DROP (design from policies), which fully answers the above questions. Specifically, in the inner level, DROP decomposes offline data into multiple subsets and learns an MBO score model (a1). To exploit the score model safely in the outer level, we explicitly learn a behavior embedding and introduce a conservative regularization (a2). During testing, we show that DROP permits deployment adaptation, enabling adaptive inference across states (a3). Empirically, we evaluate DROP on various tasks, showing that DROP achieves comparable or better performance than prior methods. Comment: NeurIPS 202

    CEIL: Generalized Contextual Imitation Learning

    In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy using the hindsight embeddings. To achieve the expert matching objective for IL, we advocate for optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings including: 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL settings. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performance in offline tasks. Comment: NeurIPS 202

    Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning

    Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data. Although it avoids the time-consuming online interactions of RL, it poses challenges for out-of-distribution (OOD) state actions and often suffers from data inefficiency in training. Despite many efforts devoted to addressing OOD state actions, the latter (data inefficiency) has received little attention in offline RL. To address this, this paper proposes cross-domain offline RL, which assumes that the offline data incorporate additional source-domain data from varying transition dynamics (environments), and expects these data to improve offline data efficiency. To do so, we identify a new challenge of OOD transition dynamics, beyond the common OOD state actions issue, when utilizing cross-domain offline data. Then, we propose our method BOSA, which employs two support-constrained objectives to address the above OOD issues. Through extensive experiments in the cross-domain offline RL setting, we demonstrate that BOSA can greatly improve offline data efficiency: using only 10% of the target data, BOSA could achieve 74.4% of the SOTA offline RL performance that uses 100% of the target data. Additionally, we show that BOSA can be effortlessly plugged into model-based offline RL and noising data augmentation techniques (used for generating source-domain data), which naturally avoids the potential dynamics mismatch between target-domain data and newly generated source-domain data.

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Developing intelligent robotic systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although impressive progress has been made in walking stability and skill learning for legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with many of the skills needed to survive, and can quickly acquire new ones by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG), for organizing a massive number of fundamental robot skills and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of massive dynamic behavioral skills instead of the static knowledge in KG, and enables discovering implicit relations between the learning context and the acquired skills of robots, serving as a starting point for understanding subtle patterns in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference for new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.

    Development and validation of machine learning based prediction model for postoperative pain risk after extraction of impacted mandibular third molars

    Background: Predicting postoperative pain risk in patients with impacted mandibular third molar extractions is helpful in guiding clinical decision-making, enhancing perioperative pain management, and improving the patients’ medical experience. This study aims to develop a prediction model based on machine learning algorithms to identify patients at high risk of postoperative pain after tooth extraction. Methods: We conducted a prospective cohort study. Outpatients with impacted mandibular third molars were recruited, and the outcome was defined as an NRS (Numerical Rating Scale) score of peak postoperative pain within 24 h after the operation of ≥7, which is considered a high risk of postoperative pain. We compared models built using nine different machine learning algorithms and conducted internal and time-series external validations to evaluate the models' predictive performance in terms of the area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score. Results: A total of 185 patients and 202 cases of impacted mandibular third molar data were included in this study. Five modeling variables were selected using least absolute shrinkage and selection operator (LASSO) regression: physician qualification, patient self-reported maximum pain sensitivity, OHI–S–CI, BMI, and systolic blood pressure. The overall performance of the random forest model was evaluated. The AUC, sensitivity, and specificity of the prediction model built using the random forest method were 0.879 (0.861–0.891), 0.857, and 0.846, respectively, for the training set and 0.724 (0.673–0.732), 0.667, and 0.600, respectively, for the time-series validation set. Conclusions: This study developed a machine learning-based postoperative pain risk prediction model for impacted mandibular third molar extraction, which is promising for providing a theoretical basis for better pain management to reduce postoperative pain after third molar extraction.

    Troxerutin Reduces Kidney Damage against BDE-47-Induced Apoptosis via Inhibiting NOX2 Activity and Increasing Nrf2 Activity

    2,2′,4,4′-Tetrabromodiphenyl ether (BDE-47), one of the persistent organic pollutants, seriously influences the quality of life; however, its pathological mechanism remains unclear. Troxerutin is a flavonoid with antioxidant and anti-inflammatory pharmacological activity. In the present study, we investigated the effect of troxerutin against BDE-47-induced kidney cell apoptosis and explored the underlying mechanism. The results show that troxerutin reduced renal cell apoptosis and urinary protein secretion in BDE-47-treated mice. Western blot analysis shows that troxerutin supplementation enhanced the ratio of Bcl-2/Bax; inhibited the release of cytochrome c from mitochondria, the activation of procaspase-9 and procaspase-3, and the cleavage of PARP; and reduced FAS, FASL, and caspase-8 levels induced by BDE-47. In addition, troxerutin decreased the production of reactive oxygen species (ROS) and increased the activities of antioxidative enzymes. Furthermore, troxerutin blunted Nrf2 ubiquitylation, enhanced the activity of Nrf2, decreased the activity of NOX2, and ameliorated the kidney oxidant status of BDE-47-treated mice. Together, these results confirm that troxerutin could alleviate the cytotoxicity of BDE-47 through antioxidation and antiapoptosis, which suggests that its protective mechanism involves the inhibition of apoptosis via suppressing NOX2 activity and increasing Nrf2 signaling.

    Dispersion and Polishing Mechanism of a Novel CeO2-LaOF-Based Chemical Mechanical Polishing Slurry for Quartz Glass

    Quartz glass shows superior physicochemical properties and is used in modern high technology. Because it is hard and brittle, traditional polishing slurries mostly use strong acids, strong alkalis, and potent corrosive additives, which cause environmental pollution. Furthermore, the damage caused by excessive corrosion reduces the service performance of the parts. Therefore, a novel green, efficient, and non-damaging chemical mechanical polishing slurry for quartz glass was developed, consisting of cerium oxide (CeO2), lanthanum oxyfluoride (LaOF), potassium pyrophosphate (K4P2O7), sodium N-lauroyl sarcosinate (SNLS), and sodium polyacrylate (PAAS). Among them, the LaOF abrasive showed a hexahedral morphology, which increased the number of cutting sites and distributed the load more uniformly. Two anionic dispersants, SNLS and PAAS, maintained the suspension stability of the slurry, giving the abrasive a more uniform particle size and the sample a smoother surface after polishing. After the orthogonal test, a surface roughness (Sa) of 0.23 nm was obtained over an area of 50 × 50 μm², which was lower than the current industry rating of 0.9 nm, along with a material removal rate (MRR) of 530.52 nm/min.

    Isolation of infectious SARS-CoV-2 from urine of a COVID-19 patient

    SARS-CoV-2 caused a major outbreak of severe pneumonia (COVID-19) in humans. Viral RNA was detected in multiple organs in COVID-19 patients. However, infectious SARS-CoV-2 was only isolated from respiratory specimens. Here, infectious SARS-CoV-2 was successfully isolated from the urine of a COVID-19 patient. The isolated virus could infect new susceptible cells and was recognized by the patient's own sera. Appropriate precautions should be taken to avoid transmission from urine.